ABSTRACT
An  affine  invariant  version  of  the  Kantorovich  theorem  for  Newton's 
method  is  presented.  The  result  includes  the  Gragg-Tapia  error  bounds,  as 
well  as  recent  optimal  and  sharper  upper  bounds,  new  optimal  and  sharper  lower 
bounds,  and  new  inequalities  showing  q-quadratic  convergence  all  in  terms  of 
the  usual  majorizing  sequence.  Closed  form  expressions  for  these  bounds  are 
given. 
SIGNIFICANCE AND EXPLANATION


A basic problem in numerical analysis is the computation of the roots of a nonlinear system $Fx = 0$, where $F : X \to X$ and $X$ is a space of $N$-tuples. Newton's method consists of the iteration

$x_0$ chosen,  $x_n = x_{n-1} - F'(x_{n-1})^{-1} F x_{n-1}$,  $n \ge 1$,

where $F'(x_{n-1})$ is the Jacobian of $F$ at $x_{n-1}$, thus generating a sequence of $N$-tuples $x_n = (\xi_{n1}, \xi_{n2}, \ldots, \xi_{nN})$, which will hopefully converge to a solution $x^* = (\xi_1^*, \xi_2^*, \ldots, \xi_N^*)$. The basic idea of the method is to take each vector $x_n$ as the solution of an approximating system of $N \times N$ linear equations. The Kantorovich theorem gives specific conditions under which the iterates $x_n$ will converge to a solution $x^*$, establishing in the process the local existence and uniqueness of that solution, and it also yields computable upper and lower bounds for the errors $\|x^* - x_n\|$. For mathematical expediency, this famous theorem is often stated in terms of operator equations in Banach spaces, but its major application to actual computer work is restricted to finite systems of equations as described above. Although the theorem was established conclusively in 1948, there is a continuing effort on the part of researchers to find the best possible and sharpest error bounds under the hypotheses of the theorem. In practice, when guaranteed accuracy is needed, error bounds provide exit criteria, viz., means of stopping the computation when an approximant has a prescribed accuracy. Thus the sharpness of error bounds is important, since it translates into savings of computer time. This report presents a complete update of the theorem, giving recent and new sharper error bounds.
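As a minimal sketch of the iteration described above, the following Python fragment applies Newton's method to a small system. The test system, the starting point, and the helper name `newton_2x2` are assumptions made for illustration only; none of them comes from the report.

```python
import math

def newton_2x2(F, J, x, steps=8):
    """Newton's method for a 2 x 2 system: solve J(x) s = F(x), then x <- x - s."""
    for _ in range(steps):
        f1, f2 = F(x)
        (a, b), (c, d) = J(x)
        det = a * d - b * c                # assumed nonzero along the iteration
        s1 = (d * f1 - b * f2) / det       # Cramer's rule for J(x) s = F(x)
        s2 = (a * f2 - c * f1) / det
        x = (x[0] - s1, x[1] - s2)
    return x

# Illustrative system (an assumption): x^2 + y^2 = 1 and x = y.
F = lambda v: (v[0] ** 2 + v[1] ** 2 - 1.0, v[0] - v[1])
J = lambda v: ((2.0 * v[0], 2.0 * v[1]), (1.0, -1.0))

root = newton_2x2(F, J, (1.0, 0.5))
print(root)
```

Each pass solves one linear system with the current Jacobian, which is the "approximating system of linear equations" mentioned in the text. A fixed step count is used here only to keep the sketch short; a practical code would stop on an error bound instead.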


The responsibility for the wording and views expressed in this descriptive
summary  lies  with  and  not  with  the  author  of  this  report. 
AN UPDATED VERSION OF THE KANTOROVICH THEOREM
FOR  NEWTON'S  METHOD 


George  Miel* 

1. Introduction. Kantorovich [5] presented in 1939 a preliminary
convergence result for Newton's method. In 1948, he used certain recurrence
relations to establish his now-famous theorem [6], and a year later, he gave
the first proof based on the majorant principle [7]. Various workers have
presented  refinements  of  the  theorem  and  related  results.  For  a  survey  of  the 
theorem's  predecessors  and  successors  prior  to  1970,  see  [2,  p.  247],  [15,  pp. 
420,  428],  [16,  p.  404]. 

With the use of the original recurrence relations, Dennis [1] improved the Kantorovich error bounds. Tapia [22] derived these improved bounds directly from Ortega's majorizing sequence [14]. Rall and Tapia [20] further improved the bounds. Under hypotheses different from the usual ones, Ostrowski [16], [17] established optimal a priori upper bounds. Gragg and Tapia [4] used the recurrence relations to get optimal a posteriori upper and lower bounds. Ptak [19] applied his principle of nondiscrete induction to derive the optimal a priori upper bounds. With the same principle, Potra and Ptak [18] obtained a posteriori upper and lower bounds sharper than those of Gragg-Tapia. Miel [9], [10] used the majorizing sequence to derive the Gragg-Tapia upper bounds, as well as new optimal and sharper upper bounds. It turns out [11] that these new bounds are finer than those of Potra-Ptak.
Although the lasting effort that has gone into finding error bounds for the Kantorovich theorem is suggestive of the theorem's depth and its central importance in nonlinear numerical analysis, one cannot help but yearn for a clean and definitive statement. Since recent refinements are either scattered or altogether not in the open literature, our purpose here is to give a complete update. As it should be, the updated theorem is affine invariant [3], and it describes clearly in terms of the usual majorizing sequence the Gragg-Tapia bounds, the recent optimal and sharper upper bounds, new optimal and finer lower bounds, and new inequalities showing q-quadratic convergence. Since the elements of the majorizing sequence are known in closed form, we readily get explicit expressions for all bounds.

Given a sequence $\{x_n\}_{n=0}^{\infty}$ in a Banach space, if there is a sequence of real numbers $\{t_n\}_{n=0}^{\infty}$ such that

(1.1)  $\lim t_n = t^* < \infty$,  $\|x_n - x_{n-1}\| \le t_n - t_{n-1}$,

then $\{x_n\}_{n=0}^{\infty}$ converges to some $x^*$ and the error bounds

(1.2)  $\|x^* - x_n\| \le t^* - t_n$

are valid [14]. The following simple result [10], given here for completeness, shows that under certain conditions, the majorizing sequence $\{t_n\}_{n=0}^{\infty}$ yields error bounds much sharper than (1.2).

LEMMA. If there is a sequence $\{t_n\}_{n=0}^{\infty}$ of real numbers such that (1.1) holds and

$t_0 = 0$,  $t_{n-1} < t_n$,  $\|x_{n+1} - x_n\| \le \dfrac{t_{n+1} - t_n}{(t_n - t_{n-1})^2}\, \|x_n - x_{n-1}\|^2$,

then $\{x_n\}_{n=0}^{\infty}$ converges to some $x^*$ and

(1.3)  $\|x^* - x_n\| \le \dfrac{t^* - t_n}{(t_n - t_{n-1})^2}\, \|x_n - x_{n-1}\|^2 \le \dfrac{t^* - t_n}{t_n - t_{n-1}}\, \|x_n - x_{n-1}\| \le \dfrac{t^* - t_n}{t_1}\, \|x_1 - x_0\|$.


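Proof. Write $\nabla x_n = x_n - x_{n-1}$ and $\nabla t_n = t_n - t_{n-1}$. If $m > n$ then

$\|\nabla x_m\| \le \nabla t_m \left( \dfrac{\|\nabla x_n\|}{\nabla t_n} \right)^{2^{m-n}} \le \nabla t_m \left( \dfrac{\|\nabla x_n\|}{\nabla t_n} \right)^{2}$.

Thus for $p \ge 1$,

$\|x_{n+p} - x_n\| \le (t_{n+p} - t_n) \left( \dfrac{\|\nabla x_n\|}{\nabla t_n} \right)^{2}$.

Take $p \to \infty$ to obtain the first inequality in (1.3). Finally, use $\|\nabla x_n\| / \nabla t_n \le \|\nabla x_1\| / t_1 \le 1$ to get the other two inequalities.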

Given Newton iterates for an operator equation $Gx = 0$,

(1.4)  $x_{n+1} = x_n - G'(x_n)^{-1} G x_n$,

the usual majorizing sequence consists of the scalar Newton iterates,

(1.5)  $t_0 = 0$,  $t_{n+1} = t_n - g'(t_n)^{-1} g(t_n)$,

of a quadratic polynomial $g(t)$ whose coefficients depend on $G$ and $x_0$. Since $g$ satisfies the Kantorovich hypotheses and (1.5) then is a special case of (1.4) with $G = g$ and $x_0 = t_0$, the bounds in (1.2) and (1.3) are all optimal. A standard argument and the bounds $\|G'(x_n)^{-1}\| \le -g'(t_n)^{-1}$ yield a quadratic inequality, (2.5) in the sequel, from which one gets lower bounds for $\|x^* - x_n\|$ in terms of $\|x_{n+1} - x_n\|$. The weaker optimality of these lower bounds is obtained as in Gragg-Tapia [4].

Warnings  against  the  use  of  majorizing  sequences  are  sometimes  sounded. 
The  reasons  given  are  the  apparent  r-order  of  convergence,  the  coarseness  of 
the  bounds,  and  the  difficulty  in  computing  the  required  constants.  The 
arguments  against  majorizing  sequences  should  perhaps  be  re-evaluated,  since 
the  updated  theorem  shows  that  the  majorizing  sequence  does  imply  q-quadratic 
convergence  and  that  new  error  bounds  are  sharper  than  the  usual  ones.  The 
problems associated with the local nature of the estimates and the verification
of hypotheses, however, do remain. In this connection, we point to
research  on  computer  verification  of  semilocal  conditions  by  interval  analysis 
and  on  interval  versions  of  Newton’s  method  [12],  [13],  [21]. 


2. The Updated Theorem. Let $X$ and $Y$ be Banach spaces and let $D$ be
an open convex subset of $X$. The open ball $\{x : \|x - x_0\| < r\}$ and its
closure are denoted by $S(x_0, r)$ and $\bar{S}(x_0, r)$ respectively.

THEOREM. Let $F : D \subset X \to Y$ be Frechet differentiable. Assume that $F'(x_0)$ is invertible for some $x_0 \in D$, and that

$\|F'(x_0)^{-1}(F'(x) - F'(y))\| \le K \|x - y\|$,  $x, y \in D$,

$\|F'(x_0)^{-1} F x_0\| \le a$,

$\bar{S}(x_0, t^*) \subset D$,  $t^* = (1 - \sqrt{1 - h})/K$,  $h = 2Ka \le 1$.

Consider the scalar iterates (1.5) for the quadratic polynomial $g(t) = \frac{K}{2} t^2 - t + a$. Then

i) The iterates $x_{n+1} = x_n - F'(x_n)^{-1} F x_n$ exist, remain in $S(x_0, t^*)$, and converge to a root $x^*$ of $F$.

ii) The root $x^*$ is unique in $S(x_0, t^{**}) \cap D$, $t^{**} = (1 + \sqrt{1 - h})/K$, if $h < 1$, and in $\bar{S}(x_0, t^*)$ if $h = 1$.

iii) The upper error bounds (1.3) are valid.

iv)  $\dfrac{2 \|x_{n+1} - x_n\|}{1 + \sqrt{1 + 4 \left( \dfrac{t_{n+1} - t_n}{t_n - t_{n-1}} \right)^2}} \le \dfrac{2 \|x_{n+1} - x_n\|}{1 + \sqrt{1 + 4\, \dfrac{t^* - t_{n+1}}{(t^* - t_n)^2}\, \|x_{n+1} - x_n\|}} \le \|x^* - x_n\|$.

v)  $\|x^* - x_{n+1}\| \le \dfrac{t^* - t_{n+1}}{(t^* - t_n)^2}\, \|x^* - x_n\|^2$.

Also, the uniqueness statement (ii) and the bounds in (iii), (iv), and (v) are best possible.
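The quantities in the theorem are straightforward to compute. The following Python sketch evaluates $h$, $t^*$, $t^{**}$ and the majorizing sequence (1.5), and compares the sharpest bound in (1.3) and the bound (1.2) with the actual error for a scalar problem. The test problem $F(x) = x^3 - 1$ with $x_0 = 1.3$, the domain $D = S(x_0, 2a)$ used to obtain a Lipschitz constant, and all function names are assumptions of this sketch.

```python
import math

def kantorovich_constants(K, a):
    """Return (h, t*, t**) as defined in the theorem; requires h = 2Ka <= 1."""
    h = 2.0 * K * a
    if h > 1.0:
        raise ValueError("hypothesis h = 2Ka <= 1 is violated")
    r = math.sqrt(1.0 - h)
    return h, (1.0 - r) / K, (1.0 + r) / K

def majorizing_sequence(K, a, count):
    """First `count` scalar Newton iterates (1.5) for g(t) = (K/2) t^2 - t + a, t_0 = 0."""
    t = [0.0]
    while len(t) < count:
        g = 0.5 * K * t[-1] ** 2 - t[-1] + a
        t.append(t[-1] - g / (K * t[-1] - 1.0))
    return t

# Assumed test problem: F(x) = x^3 - 1 on D = S(x0, 2a) with x0 = 1.3.
x0 = 1.3
dF0 = 3.0 * x0 ** 2
a = abs(x0 ** 3 - 1.0) / dF0           # bound on |F'(x0)^{-1} F(x0)|
K = 6.0 * (x0 + 2.0 * a) / dF0         # sup of |F''| on D divided by |F'(x0)|
h, t_star, t_2star = kantorovich_constants(K, a)

x = [x0]
for _ in range(5):                      # Newton iterates x_1, ..., x_5
    x.append(x[-1] - (x[-1] ** 3 - 1.0) / (3.0 * x[-1] ** 2))
t = majorizing_sequence(K, a, 6)

for n in range(1, 4):
    d_n = abs(x[n] - x[n - 1])
    sharp = (t_star - t[n]) / (t[n] - t[n - 1]) ** 2 * d_n ** 2   # first bound in (1.3)
    print(n, abs(1.0 - x[n]), sharp, t_star - t[n])               # error, (1.3), (1.2)
```

The loop stops at small $n$ on purpose: once the iterates agree with the root to machine precision, the differences $t^* - t_n$ and $x_n - x_{n-1}$ are dominated by rounding and the comparison is no longer meaningful in floating point.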




Proof. Consider the scaling

(2.1)  $Gx = F'(x_0)^{-1} Fx$.

The Banach lemma implies that $G'(x)$ is invertible for every $x \in S(x_0, t^*)$. If both $x$ and $Hx = x - G'(x)^{-1} Gx$ are in $S(x_0, t^*)$ then

$\|H(Hx) - Hx\| \le \dfrac{K/2}{1 - K \|Hx - x_0\|}\, \|Hx - x\|^2$.

The sequence $\{t_n\}$ satisfies the conditions $\|x_1 - x_0\| \le t_1 = a$, $t_{n-1} < t_n$, $\lim t_n = t^*$, and

(2.2)  $\dfrac{K/2}{1 - K t_n} = \dfrac{t_{n+1} - t_n}{(t_n - t_{n-1})^2}$.

An induction argument shows that $\{x_n\}$ exists in $S(x_0, t^*)$ and that the hypotheses of the lemma hold. We thus get (i) and (iii). Consideration of the simplified Newton method yields (ii). Letting $e_n = t^* - t_n$, we have

(2.3)  $e_0 = t^*$,  $e_{n+1} = \dfrac{e_n^2}{2 e_n + \Delta}$,  $\Delta = t^{**} - t^*$,

(2.4)  $\|G'(x_n)^{-1}\| \le -g'(t_n)^{-1} = \dfrac{2}{K}\, \dfrac{e_{n+1}}{e_n^2}$.

From the identity

$x_{n+1} - x_n = (x^* - x_n) + G'(x_n)^{-1} (Gx^* - Gx_n - G'(x_n)(x^* - x_n))$,

a mean-value theorem, and (2.4), we get

(2.5)  $\dfrac{e_{n+1}}{e_n^2}\, \|x^* - x_n\|^2 + \|x^* - x_n\| - \|x_{n+1} - x_n\| \ge 0$.

The sharper lower bounds in (iv) follow. Use $\|x_{n+1} - x_n\| \le t_{n+1} - t_n$ and

$\dfrac{e_{n+1}}{e_n^2} = \dfrac{t_{n+1} - t_n}{(t_n - t_{n-1})^2}$,

which results from (2.2) and (2.4), to get the other lower bounds. Use (2.4) and a mean-value theorem on

$\|x^* - x_{n+1}\| \le \|G'(x_n)^{-1}\|\, \|Gx^* - Gx_n - G'(x_n)(x^* - x_n)\|$

to get (v). Obtain the optimality as indicated in the introduction to complete the proof.
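The step from the quadratic inequality (2.5) to a lower bound amounts to taking the positive root of a quadratic, and the relation between (2.2) and (2.4) can be checked numerically. The sketch below does both; the constants $K = 2$, $a = 0.2$ and the function name `lower_bound` are hypothetical choices for illustration.

```python
import math

def lower_bound(c, d_next):
    """Positive root of c*y^2 + y - d_next = 0, the smallest y allowed by (2.5).

    c plays the role of (t* - t_{n+1}) / (t* - t_n)^2 and
    d_next the role of ||x_{n+1} - x_n||.
    """
    return 2.0 * d_next / (1.0 + math.sqrt(1.0 + 4.0 * c * d_next))

# Hypothetical constants with h = 2Ka = 0.8 < 1.
K, a = 2.0, 0.2
t_star = (1.0 - math.sqrt(1.0 - 2.0 * K * a)) / K
t = [0.0]
for _ in range(3):                       # t_1, t_2, t_3 from (1.5)
    g = 0.5 * K * t[-1] ** 2 - t[-1] + a
    t.append(t[-1] - g / (K * t[-1] - 1.0))

for n in (1, 2):
    e_n, e_next = t_star - t[n], t_star - t[n + 1]
    ratio_e = e_next / e_n ** 2                               # e_{n+1} / e_n^2
    ratio_t = (t[n + 1] - t[n]) / (t[n] - t[n - 1]) ** 2      # right side of (2.2)
    print(n, ratio_e, ratio_t, 0.5 * K / (1.0 - K * t[n]))    # all three should agree
```

The rationalized form `2d / (1 + sqrt(1 + 4cd))` is used instead of `(-1 + sqrt(1 + 4cd)) / (2c)` because it avoids cancellation when `c * d_next` is small.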

The  bounds  in  the  theorem  are  expressed  in  terms  of  the  majorizing 
sequence,  but  since  Newton  iterates  for  quadratic  polynomials  are  known  in 
closed form, [8, p. 28] or [17, Appendix F], these bounds can be given
explicitly. 

COROLLARY. Assume that the hypotheses of the theorem hold and let $\Delta = t^{**} - t^*$ and $\theta = t^*/t^{**}$. Then

$\dfrac{2 \|x_{n+1} - x_n\|}{1 + \sqrt{1 + \dfrac{4 \theta^{2^n}}{(1 + \theta^{2^n})^2}}} \le \dfrac{2 \|x_{n+1} - x_n\|}{1 + \sqrt{1 + \dfrac{4}{\Delta}\, \dfrac{1 - \theta^{2^n}}{1 + \theta^{2^n}}\, \|x_{n+1} - x_n\|}} \le \|x^* - x_n\|$

$\le \dfrac{1 - \theta^{2^n}}{\Delta}\, \|x_n - x_{n-1}\|^2 \le \theta^{2^{n-1}} \|x_n - x_{n-1}\| \le \dfrac{\Delta\, \theta^{2^n}}{a (1 - \theta^{2^n})}\, \|x_1 - x_0\|$

if $h < 1$, and

$2(\sqrt{2} - 1) \|x_{n+1} - x_n\| \le 2^{1-n} a \left( \sqrt{1 + \dfrac{2^n}{a}\, \|x_{n+1} - x_n\|} - 1 \right) \le \|x^* - x_n\|$

$\le \dfrac{2^{n-1}}{a}\, \|x_n - x_{n-1}\|^2 \le \|x_n - x_{n-1}\| \le 2^{-n+1} \|x_1 - x_0\|$

if $h = 1$.

Proof. We first solve the nonlinear difference equation (2.3). If $h = 1$ then $\Delta = 0$ and

(2.6)  $e_n = 2^{-n+1} a$.

If $h < 1$, let $u_n = e_n/(e_n + \Delta)$, thus getting $u_0 = \theta$, $u_{n+1} = u_n^2$. The solution is $u_n = \theta^{2^n}$, and so,

(2.7)  $e_n = \dfrac{\Delta\, \theta^{2^n}}{1 - \theta^{2^n}}$

for $h < 1$. Now use (2.6) and (2.7) on

$t_{n+1} - t_n = e_n - e_{n+1}$,  $\dfrac{t^* - t_n}{(t_n - t_{n-1})^2} = \dfrac{e_n}{(e_{n-1} - e_n)^2}$,  etc.

to obtain the desired expressions.
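The closed forms (2.6) and (2.7) can be compared against the difference equation (2.3) directly. The following Python sketch does so; the constants passed in and the function names are hypothetical, and the exact floating-point test `h == 1.0` is only adequate for inputs where $2Ka$ is representable exactly.

```python
import math

def e_recurrence(K, a, n):
    """e_n = t* - t_n from the difference equation (2.3)."""
    r = math.sqrt(1.0 - 2.0 * K * a)
    e, delta = (1.0 - r) / K, 2.0 * r / K     # e_0 = t*, Delta = t** - t*
    for _ in range(n):
        e = e * e / (2.0 * e + delta)
    return e

def e_closed_form(K, a, n):
    """e_n from (2.6) when h = 1 and from (2.7) when h < 1."""
    h = 2.0 * K * a
    if h == 1.0:
        return 2.0 ** (1 - n) * a
    r = math.sqrt(1.0 - h)
    delta, theta = 2.0 * r / K, (1.0 - r) / (1.0 + r)   # theta = t* / t**
    p = theta ** (2 ** n)
    return delta * p / (1.0 - p)

for n in range(5):
    print(n, e_recurrence(2.0, 0.2, n), e_closed_form(2.0, 0.2, n))
```

With $h < 1$ the values shrink doubly exponentially in $n$, whereas for $h = 1$ they are only halved at each step, which is the usual loss of quadratic convergence in the limiting case.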


3. Remarks and Numerical Example. We point out features of the above
version of the Kantorovich theorem.

3.1.  The  theorem  is  affine  invariant  and  the  transformation  (2.1)  is  an 
optimal  scaling  [3]. 

3.2. Statement (iv) gives an improvement of the Gragg-Tapia lower
bounds,  since  the  latter  are  equivalent  to  the  left-most  bounds. 

3.3. The inequalities in (v) show that the majorizing sequence yields not only second r-order convergence, but the stronger second q-order as well. Indeed, if $h < 1$ then (2.3) implies that $\lim e_{n+1}/e_n^2 = 1/\Delta < \infty$.

3.4. The two right-most bounds in (1.3) are equivalent to the upper bounds of Gragg-Tapia. The bounds with $\|x_n - x_{n-1}\|^2$ are in practice considerably sharper.

3.5. The recent bounds of Potra-Ptak [18], obtained by nondiscrete induction, become in our notation,

$\gamma(\Delta/2, d_{n+1}) \le \|x^* - x_n\| \le \delta(\Delta/2, d_n)$,  $d_n = \|x_n - x_{n-1}\|$,

$\gamma(s,t) = (s^2 + 4t^2 + 4t(s^2 + t^2)^{1/2})^{1/2} - (t + (s^2 + t^2)^{1/2})$,

$\delta(s,t) = (s^2 + t^2)^{1/2} - s$.

These bounds are sharper than those of Gragg-Tapia. However, it turns out that $(t^* - t_n) d_n^2/(t_n - t_{n-1})^2 \le \delta(\Delta/2, d_n)$, see [11]. Numerical experiments also indicate that the sharper lower bounds in (iv) are finer than $\gamma(\Delta/2, d_{n+1})$.
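The comparison in Remark 3.5 can be tried numerically. The following Python sketch evaluates the Potra-Ptak functions $\gamma$ and $\delta$ next to the sharpest upper bound in (1.3) for a scalar problem. The test problem $F(x) = x^3 - 1$, $x_0 = 1.3$, the domain $D = S(x_0, 2a)$ used for the Lipschitz constant, and the variable names are assumptions of this sketch.

```python
import math

def gamma(s, t):
    """Potra-Ptak lower-bound function of Remark 3.5."""
    r = math.sqrt(s * s + t * t)
    return math.sqrt(s * s + 4.0 * t * t + 4.0 * t * r) - (t + r)

def delta(s, t):
    """Potra-Ptak upper-bound function of Remark 3.5."""
    return math.sqrt(s * s + t * t) - s

# Assumed test problem: F(x) = x^3 - 1 on D = S(x0, 2a) with x0 = 1.3.
x0 = 1.3
dF0 = 3.0 * x0 ** 2
a = abs(x0 ** 3 - 1.0) / dF0
K = 6.0 * (x0 + 2.0 * a) / dF0
r = math.sqrt(1.0 - 2.0 * K * a)
t_star, half_gap = (1.0 - r) / K, r / K        # t* and Delta/2

x, t = [x0], [0.0]
for _ in range(4):                             # Newton iterates and majorizing sequence
    x.append(x[-1] - (x[-1] ** 3 - 1.0) / (3.0 * x[-1] ** 2))
    t.append(t[-1] - (0.5 * K * t[-1] ** 2 - t[-1] + a) / (K * t[-1] - 1.0))

for n in (2, 3):
    d_n = abs(x[n] - x[n - 1])
    d_next = abs(x[n + 1] - x[n])
    upper_13 = (t_star - t[n]) / (t[n] - t[n - 1]) ** 2 * d_n ** 2
    # columns: n, gamma lower bound, true error, bound from (1.3), delta upper bound
    print(n, gamma(half_gap, d_next), abs(1.0 - x[n]), upper_13, delta(half_gap, d_n))
```

The case $n = 1$ is left out because $\|x_1 - x_0\| = t_1$ there, so the two upper bounds coincide up to rounding.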
n+  i 

3.6. In practice, the user should employ the sharpest bounds,

(3.1)  $\dfrac{2 d_{n+1}}{1 + \sqrt{1 + 4 a_n d_{n+1}}} \le \|x^* - x_n\| \le A_n d_n^2$.

The following recurrence relations are convenient for programmed computation:

(3.2)  $a_0 = (t^* + t^{**})^{-1}$,  $a_{n+1} = \dfrac{2 a_n}{1 + \Delta^2 a_n^2}$,

(3.3)  $A_1 = \theta/a$,  $A_{n+1} = A_n (2 - \Delta A_n)$.

Use (2.3) and $a_n = 1/(2 e_n + \Delta)$ to verify (3.2). For (3.3), see [10, p. 192].

3.7. We borrow an example given in [IP]. The table lists all the bounds stated in the theorem for the scalar cubic $F(x) = x^3 - 1$, with $x_0 = 1.3$, $a = 0.236095$, $K = 2.09727$. The bounds in (3.1), especially the upper bounds, are seen to be sharper.
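The recurrences of Remark 3.6 can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's program: the constants $K = 2$, $a = 0.2$ and the names `bound_coefficients` and `sharpest_bounds` are hypothetical, and only the case $h < 1$ is handled.

```python
import math

def bound_coefficients(K, a, n_max):
    """Return a_seq[0..n_max] from (3.2) and A_seq[1..n_max] from (3.3), for h < 1."""
    r = math.sqrt(1.0 - 2.0 * K * a)
    t_star, t_2star = (1.0 - r) / K, (1.0 + r) / K
    gap, theta = t_2star - t_star, t_star / t_2star      # Delta and theta
    a_seq = [1.0 / (t_star + t_2star)]
    A_seq = [None, theta / a]                            # index 0 is unused
    for n in range(n_max):
        a_seq.append(2.0 * a_seq[n] / (1.0 + (gap * a_seq[n]) ** 2))
    for n in range(1, n_max):
        A_seq.append(A_seq[n] * (2.0 - gap * A_seq[n]))
    return a_seq, A_seq

def sharpest_bounds(a_n, A_n, d_n, d_next):
    """Lower and upper bounds (3.1) for ||x* - x_n||."""
    lower = 2.0 * d_next / (1.0 + math.sqrt(1.0 + 4.0 * a_n * d_next))
    return lower, A_n * d_n ** 2

a_seq, A_seq = bound_coefficients(2.0, 0.2, 4)   # hypothetical K and a, h = 0.8
print(a_seq)
print(A_seq[1:])
```

In a Newton code, `sharpest_bounds` would be called with `d_n = ||x_n - x_{n-1}||` and `d_next = ||x_{n+1} - x_n||`, and the iteration stopped once the upper bound falls below the requested tolerance. Both recurrences avoid forming $t^* - t_n$ explicitly, which would suffer from cancellation as $t_n$ approaches $t^*$.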